!pr2
68000 Sieve Benchmark.......................Peter J. McInerney
                                                   New Zealand

Here are two versions of the Sieve of Eratosthenes for the MC68000.  They provide ample justification for the power claimed for this chip.

The first version is a fairly straightforward translation of the algorithm Tony Brightwell presented in the November 1982 AAL.  Tony's best time on the 6502 was 183 seconds for 1000 repetitions; on my 12.5 MHz DTACK GROUNDED attached processor, 1000 repetitions took only 40 seconds.

Compare the 68000 code with the 6502 code, and I'm sure you will agree the 68000 version is much easier to understand.  Note the use of long (32-bit) instructions in the array clearing loop and the two-dimensional indexing in lines 1230 and 1310.  Other nice things are the shift left by 3 (multiply by 8) in line 1270 and the decrement-and-branch instructions in lines 1120 and 1400.  Also very useful is the postincrement address mode, which automatically increments the address kept in the referenced register by 1, 2, or 4, depending on the size of the operation.  This is used for popping items off (downward-growing) stacks or, as here, for advancing through memory.  There is also a predecrement mode, but I did not use it in these example programs.
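
For readers who do not know 68000 assembly, here is a rough C analogue of three of the features mentioned above.  It is only a sketch, not a translation of the listing: the names clear_longs and times_eight are made up for illustration, and whether a particular compiler turns these statements into postincrement stores or a decrement-and-branch instruction is up to the compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clear `count` 32-bit longs starting at p and
   return the address just past the last one cleared.
   count must be at least 1. */
static uint32_t *clear_longs(uint32_t *p, size_t count)
{
    do {
        *p++ = 0;            /* store a long, then advance p by 4 bytes,
                                in the spirit of the postincrement mode */
    } while (--count != 0);  /* count down and loop, loosely like a
                                decrement-and-branch instruction */
    return p;
}

/* Shifting left by 3 multiplies an unsigned value by 8. */
static unsigned times_eight(unsigned s)
{
    return s << 3;
}
```

The pointer moves by the size of the item stored (4 bytes here), which is the behavior the postincrement mode provides in hardware for byte, word, and long operations.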

The second version uses a modified algorithm.  The changes I made should apply to the 6502 version also, improving it in about the same proportion.

!lm+3
!pp-3
*  Since we are ignoring even numbers, we may as well leave them out of the array entirely, thus halving the array size.

*  We can therefore simplify the formula for odd squares from S*8+1 to S*4.

*  We can even do away with the *4 part by adding 4 each time rather than 1.

*  The initial array clearing loop can be made faster by using more than one CLR instruction per loop.
!lm-3
!pp0
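
The four changes above can be sketched in C.  This is my reading of them, not the 68000 listing: I assume flag i stands for the odd number 2i+1, so the square of that number sits at index 2i(i+1), which is 4 times the i-th triangular number.  Successive square indices then differ by 4, 8, 12, and so on, so a running step that grows by 4 replaces the multiply.  The names MAX_LIMIT, composite, and count_primes are hypothetical.

```c
#include <stddef.h>

#define MAX_LIMIT 16384              /* assumed upper bound, for illustration */
#define MAX_FLAGS (MAX_LIMIT / 2)    /* one flag per odd number; multiple of 4 */

static unsigned char composite[MAX_FLAGS];

/* Count the primes below `limit` (limit <= MAX_LIMIT).
   Flag i represents the odd number 2*i + 1; even numbers have no flag. */
static unsigned count_primes(size_t limit)
{
    size_t nflags = limit / 2;
    size_t i, j;
    size_t step = 0;     /* grows by 4 on every pass */
    size_t sq = 0;       /* index of (2*i + 1) squared, i.e. 2*i*(i + 1) */
    unsigned count = 1;  /* the prime 2, which has no flag */

    if (limit < 3)
        return 0;

    /* Clear the array with four stores per pass instead of one. */
    for (i = 0; i < MAX_FLAGS; i += 4) {
        composite[i] = 0;
        composite[i + 1] = 0;
        composite[i + 2] = 0;
        composite[i + 3] = 0;
    }

    for (i = 1; i < nflags; i++) {
        step += 4;
        sq += step;                      /* 4, 12, 24, 40, ... */
        if (composite[i])
            continue;
        count++;                         /* 2*i + 1 is prime */
        for (j = sq; j < nflags; j += 2 * i + 1)
            composite[j] = 1;            /* strike odd multiples from the square up */
    }
    return count;
}
```

For example, count_primes(100) returns 25.  Stepping the index by 2i+1 in the half-size array skips every even multiple, which is where the saving over the full-size array comes from.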

This modified version does 1000 iterations in only 33 seconds!  It is only slightly harder to follow than the first version, and only slightly larger.  In fact, if we forgo the final modification above, the code is actually shorter.  I think most of the speedup comes from halving the array size.

If you have a Macintosh, and can manage to load machine code into it, you should find everything running about half as fast as my DTACK GROUNDED board.

[ We tried the program on our QWERTY Q-68 board, and it took roughly 10 times as long as on Peter's DG board.  That is understandable, since it was using Apple memory at a 0.5 MHz rate for all work.  (Bill & Bob) ]
